Effective Use of Discontinuous Phrases for Hierarchical Phrase-based Translation

Authors

  • Wei Wei
  • Bo Xu
Abstract

Hierarchical phrase-based (HPB) models have shown strong capability in generalization and reordering. However, they depend heavily on continuous phrases and have difficulty modeling natural linguistic discontinuities directly. In this paper, we propose a novel approach for integrating discontinuous phrases into a Chinese-to-English HPB system. We focus on the extraction method for discontinuous phrases, which retrieves linguistic information missed by the HPB model, such as set phrases and long-distance reordering of adverbials. After being transformed into a form similar to HPB rules, the translation rules with discontinuities can be seamlessly integrated into the CKY decoder. Experimental results show that the proposed approach for incorporating linguistic discontinuities achieves statistically significant improvements over the traditional HPB system.
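
The abstract does not spell out how a discontinuous phrase pair is rewritten into HPB form, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that a discontinuous phrase is given as a token list with a gap marker, and that gaps on the two sides correspond in left-to-right order. The names `GAP` and `to_hpb_rule`, and the example phrase pair, are hypothetical and chosen purely for illustration.

```python
from typing import List

GAP = "_"  # hypothetical marker for a gap inside a discontinuous phrase


def to_hpb_rule(src: List[str], tgt: List[str], lhs: str = "X") -> str:
    """Rewrite a gapped phrase pair as an HPB-style rule string.

    Assumption (not from the paper): the i-th gap on the source side
    corresponds to the i-th gap on the target side.
    """

    def index_gaps(tokens: List[str]):
        # Replace each gap marker with an indexed nonterminal: X1, X2, ...
        out, count = [], 0
        for tok in tokens:
            if tok == GAP:
                count += 1
                out.append(f"{lhs}{count}")
            else:
                out.append(tok)
        return out, count

    src_side, n_src = index_gaps(src)
    tgt_side, n_tgt = index_gaps(tgt)
    if n_src != n_tgt:
        raise ValueError("source and target must contain the same number of gaps")
    return f"{lhs} -> < {' '.join(src_side)} , {' '.join(tgt_side)} >"


# Illustrative example: a source phrase with a gap between its two words.
print(to_hpb_rule(["在", GAP, "上"], ["on", GAP]))
```

Once a gapped phrase is written this way, it has the same shape as an ordinary hierarchical rule with nonterminal slots, which is why a CKY-style decoder could in principle consume it without special handling.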

Similar resources

Source-Side Discontinuous Phrases for Machine Translation: A Comparative Study on Phrase Extraction and Search

Standard phrase-based statistical machine translation systems generate translations based on an inventory of continuous bilingual phrases. In this work, we extend a phrase-based decoder with the ability to make use of phrases that are discontinuous in the source part. Our dynamic programming beam search algorithm supports separate pruning of coverage hypotheses per cardinality and of lexical hy...

Accurate Non-Hierarchical Phrase-Based Translation

A principal weakness of conventional (i.e., non-hierarchical) phrase-based statistical machine translation is that it can only exploit continuous phrases. In this paper, we extend phrase-based decoding to allow both source and target phrasal discontinuities, which provide better generalization on unseen data and yield significant improvements to a standard phrase-based system (Moses). More inte...

A Generalized Reordering Model for Phrase-Based Statistical Machine Translation

Phrase-based translation models are widely studied in statistical machine translation (SMT). However, the existing phrase-based translation models either cannot deal with non-contiguous phrases or reorder phrases only by the rules without an effective reordering model. In this paper, we propose a generalized reordering model (GREM) for phrase-based statistical machine translation, which is not...

Offline Extraction of Overlapping Phrases for Hierarchical Phrase-Based Translation

Standard SMT decoders operate by translating disjoint spans of input words, thus discarding information in the form of overlapping phrases that is present at phrase extraction time. The use of overlapping phrases in translation may enhance fluency in positions that would otherwise be phrase boundaries, they may provide additional statistical support for long and rare phrases, and they may generate ...

Analysing soft syntax features and heuristics for hierarchical phrase based machine translation

Similar to phrase-based machine translation, hierarchical systems produce a large proportion of phrases, most of which are supposedly junk and useless for the actual translation. For the hierarchical case, however, the number of extracted rules is an order of magnitude larger. In this paper, we investigate several soft constraints in the extraction of hierarchical phrases and whether these help...

Publication date: 2011